Build scraper passed #73
scraper.js scrapes information about all blog posts from blog.zairza.in
Build a scraper that fetches post details into blogs.ejs
Make the blogs section layout responsive, and render the dates on blog cards using a regex
Dates on blog cards now use the DD/MM/YYYY format
The latest post from Medium is now also fetched and added to the data.json file
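The DD/MM/YYYY card dates can be derived from the ISO `release` strings stored in data.json (for example `2019-08-25T12:13:49.122Z`) with a single regex. This is a minimal sketch, assuming `release` is always an ISO 8601 string; `formatReleaseDate` is a hypothetical helper name, not the code from this PR.

```javascript
// Hypothetical helper: turn an ISO 8601 timestamp such as
// "2019-08-25T12:13:49.122Z" into "25/08/2019" for the blog cards.
// It reads the calendar date exactly as written in the string (UTC here)
// and does not convert to the viewer's local time zone.
function formatReleaseDate(iso) {
  const match = /^(\d{4})-(\d{2})-(\d{2})/.exec(iso);
  if (!match) {
    throw new Error(`Unrecognised date: ${iso}`);
  }
  const [, year, month, day] = match;
  return `${day}/${month}/${year}`;
}

console.log(formatReleaseDate("2019-08-25T12:13:49.122Z")); // 25/08/2019
```

With Moment.js, `moment(release).format("DD/MM/YYYY")` produces the same pattern but in the server's local time zone, so a post released close to midnight UTC can show a different day; `moment.utc(release).format("DD/MM/YYYY")` matches the regex output.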
@tulsi-prasad Good work, see if what I mentioned above can be done.
@@ -0,0 +1,105 @@
// The scraper for the blog.ejs section in the application.
Where do you run this file?
routes/data.json
"href": "https://blog.zairza.in/oauth-using-mevn-stack-4b4a383dae08?source=collection_home---6------0-----------------------",
"author": "Ramakrishna Pattnaik",
"release": "2019-08-25T12:13:49.122Z",
"cover": "https://cdn-images-1.medium.com/fit/t/1600/480/1*zqCh8ZNR-LjBzaacpiIyUA.png"
Since you are fetching the cover image, it's huge, which is why it appears zoomed out. We need the first image inside the blog post instead. WDYT? Can this be done?
Yes, I think so. We need to scrape each blog post's href to get the right image. Working on it now.
We'll most likely do this step only for the 4 most recent posts, as an optimization.
Yeah, that should cut down on unnecessary requests.
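Limiting the per-post image requests to the 4 most recent posts can be done by sorting on `release` and slicing. This is a minimal sketch, assuming every post object carries an ISO 8601 `release` field as in data.json; `mostRecentPosts` and the sample titles are hypothetical. If the scraped list is already newest first, `posts.slice(0, 4)` alone is enough.

```javascript
// Hypothetical helper: select the N most recent posts (default 4) so that only
// those post pages are requested again for their first image.
// Assumes every post has an ISO 8601 `release` field, as in data.json.
function mostRecentPosts(posts, count = 4) {
  return [...posts] // copy so the caller's array is not reordered
    .sort((a, b) => Date.parse(b.release) - Date.parse(a.release))
    .slice(0, count);
}

// Sample data (hypothetical titles).
const posts = [
  { title: "A", release: "2019-06-01T10:00:00.000Z" },
  { title: "B", release: "2019-08-25T12:13:49.122Z" },
  { title: "C", release: "2019-07-15T08:30:00.000Z" },
  { title: "D", release: "2019-05-20T09:00:00.000Z" },
  { title: "E", release: "2019-08-01T18:45:00.000Z" },
];

console.log(mostRecentPosts(posts).map((p) => p.title)); // [ 'B', 'E', 'C', 'A' ]
```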
Add Moment.js as a dependency to work with date-time objects
Render date-time objects using the moment package
Scrape the first image from each blog post and form a cover object
Add cover objects with image URLs for the first 4 blogs to the cover.json file
Work to Do
Fetch the cover image URLs (the first image) from each individual blog post. This is taken care of in 567dc39, and the next commit adds them to …
Problems now facing
The order of the scraped image URLs does not match the order of the blog posts. This seems to appear out of nowhere, as the while loop iterates from …
Fix to Try
I am thinking of making a different …
EDIT 1
Refactored the code in the next commit, 2b837e5. Any suggestions are appreciated. 👍
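One common cause of this kind of ordering problem, assuming each request's result is pushed into the array when its response arrives, is that responses come back in network-completion order rather than loop order. Building the array from `Promise.all` over the mapped requests keeps the results aligned with the input list. The self-contained sketch below simulates requests with timers; `fakeFetchFirstImage` and the delays are hypothetical, and `bloglinks` only mirrors the array name used in this PR.

```javascript
// Simulated request: resolves with a fake image URL after `delayMs`.
// `fakeFetchFirstImage` is hypothetical; it stands in for a real HTTP request.
function fakeFetchFirstImage(url, delayMs) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`${url}/first-image.png`), delayMs)
  );
}

const bloglinks = ["post-1", "post-2", "post-3", "post-4"];
const delays = [40, 10, 30, 20]; // later posts may answer sooner

// Problematic pattern: push each result when its response arrives,
// so the array ends up in completion order, not blog order.
async function coversInCompletionOrder() {
  const covers = [];
  await Promise.all(
    bloglinks.map((link, i) =>
      fakeFetchFirstImage(link, delays[i]).then((img) => covers.push(img))
    )
  );
  return covers;
}

// Order-preserving pattern: Promise.all resolves to an array whose
// entries line up with the input array, whatever the completion order.
async function coversInBlogOrder() {
  return Promise.all(
    bloglinks.map((link, i) => fakeFetchFirstImage(link, delays[i]))
  );
}

async function main() {
  console.log(await coversInCompletionOrder()); // order: post-2, post-4, post-3, post-1
  console.log(await coversInBlogOrder()); // order: post-1, post-2, post-3, post-4
}

main().catch(console.error);
```

The requests still run concurrently in the second version; only the way results are collected changes.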
The json folder stores the scraped data, and scrapcover fetches the cover images from each blog post
The bloglinks array contains the four URLs, in order, to be scraped for cover images
You should keep the entire logic in one file. When you fetch the blog URL for the cover image, make another request and parse it using cheerio. async/await will help.
This PR is continued in #74 to avoid any local conflicts.
I have worked on a new branch. Will update in another PR.
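The reviewer's suggestion (one file, a second request per post, async/await) can be sketched as a single function that takes the scraped post objects, fetches each post page, and attaches its first image as `cover`. This is a sketch under assumptions: `attachCovers`, `firstImageSrc`, and the injected `fetchHtml` are hypothetical names, the sample URLs are placeholders, and the regex is a rough dependency-free stand-in for cheerio, where the equivalent lookup would be `const $ = cheerio.load(html); $("img").first().attr("src")`. A real HTML parser is more robust than the regex.

```javascript
// Rough stand-in for a cheerio lookup: return the src of the first <img>
// tag in the HTML string, or null when the page has no <img>.
function firstImageSrc(html) {
  const match = /<img\b[^>]*\bsrc="([^"]+)"/i.exec(html);
  return match ? match[1] : null;
}

// Hypothetical single-file flow: fetch the first `count` post pages
// concurrently and attach each page's first image as `cover`.
// `fetchHtml` is injected so any HTTP client can be used, e.g. on Node 18+:
//   const fetchHtml = (url) => fetch(url).then((res) => res.text());
async function attachCovers(posts, fetchHtml, count = 4) {
  const recent = posts.slice(0, count); // assumes posts are newest first
  const pages = await Promise.all(recent.map((post) => fetchHtml(post.href)));
  return recent.map((post, i) => ({ ...post, cover: firstImageSrc(pages[i]) }));
}

// Offline usage with a fake fetcher (placeholder URLs and HTML).
const fakePages = {
  "https://example.com/post-a": '<h1>A</h1><img class="x" src="https://example.com/a.png">',
  "https://example.com/post-b": "<p>No images here</p>",
};
const fakeFetchHtml = async (url) => fakePages[url];

attachCovers(
  [
    { title: "A", href: "https://example.com/post-a" },
    { title: "B", href: "https://example.com/post-b" },
  ],
  fakeFetchHtml
).then((withCovers) => console.log(withCovers));
```

Injecting `fetchHtml` keeps the scraping logic in one file while letting it be exercised without network access.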
Description
This PR is a WIP (work in progress) aimed at solving #68. The scraper.js file, built within the router directory, contains the code to fetch data from blog.zairza.in. It fetches the title, href, author, release-date, and cover-img-link of all the blog posts and stores them as objects.
Dependencies Added
Work remaining to do:
… blog.zairza.in and are stored as objects
… app.js file
… blog.ejs for rendering the blog section
… blogs.json at the end to fetch the scraped data into our website
All suggestions are appreciated. 👍
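Storing the scraped objects in a JSON file and reading them back for rendering can be sketched with Node's built-in `fs` module. This is a minimal sketch, assuming the scraper runs separately and writes its results to blogs.json; `saveBlogs`, `loadBlogs`, the demo path, and the Express view name in the comment are hypothetical, not the PR's code.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helpers: persist the scraped post objects to a JSON file and
// read them back, e.g. so app.js can pass them to the blog.ejs template.
function saveBlogs(posts, file) {
  fs.mkdirSync(path.dirname(file), { recursive: true }); // create parent folder if needed
  fs.writeFileSync(file, JSON.stringify(posts, null, 2) + "\n", "utf8");
}

function loadBlogs(file) {
  if (!fs.existsSync(file)) return []; // nothing scraped yet
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// In an Express app this might be used as (not run here):
//   app.get("/blogs", (req, res) => res.render("blog", { posts: loadBlogs(file) }));

// Demo with a placeholder post written to a temporary directory.
const file = path.join(os.tmpdir(), "zairza-demo", "blogs.json");
saveBlogs(
  [{ title: "Sample post", href: "https://example.com/sample", release: "2019-08-25T12:13:49.122Z" }],
  file
);
console.log(loadBlogs(file)[0].title); // Sample post
```

Reading from a file keeps page requests fast, since the site does not scrape the blog on every visit; the trade-off is that the data is only as fresh as the last scraper run.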